Using Features from Topic Models to Alleviate Over-Generation in Hierarchical Phrase-Based Translation
Abstract
In hierarchical phrase-based translation systems, the grammars (synchronous context-free grammar, or SCFG, rules) suffer from an over-generation problem: the non-terminal X can be replaced with almost anything, with no knowledge of the syntactic or semantic role of X. In this paper, we present an approach that uses topic models to learn distributions for the non-terminals in each SCFG rule, from which we derive static features for the discriminative framework of statistical machine translation. Experimental results on three corpora show that these topic-model features yield some gains in BLEU by alleviating the over-generation problem in hierarchical phrase-based translation.
Similar resources
A Topic Similarity Model for Hierarchical Phrase-based Translation
Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level. However, SMT has advanced from the word-based paradigm to the phrase/rule-based paradigm. We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation. We associate each synchronous rule with a t...
Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation
Hierarchical phrase-based (HPB) translation provides a powerful mechanism to capture both short- and long-distance phrase reorderings. However, phrase reorderings in conventional HPB systems lack contextual information. This paper proposes a context-dependent phrase reordering approach that uses the maximum entropy (MaxEnt) model to help the HPB decoder select appropriate reordering patter...
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Learning Phrase Boundaries for Hierarchical Phrase-based Translation
Hierarchical phrase-based models provide a powerful mechanism to capture non-local phrase reorderings for statistical machine translation (SMT). However, many phrase reorderings are arbitrary because the models are weak at determining phrase boundaries for pattern matching. This paper presents a novel approach to learn phrase boundaries directly from a word-aligned corpus without using any syntact...
Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation
This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of an attention mechanism is used to balance the attentive information between lexical...